Abstract

We present a novel adaptive random subspace learning (RSSL) algorithm for prediction. The framework is flexible in that it can be combined with any learning technique. In this paper, we test the algorithm on regression and classification problems. In addition, we provide a variety of weighting schemes to increase the robustness of the algorithm. These weighting schemes were evaluated on simulated as well as real-world data sets, covering the cases where the ratio of features (attributes) to instances (samples) is large and vice versa. The algorithm consists of six stages. First, calculate the weights of all features in the data set using the correlation coefficient and the F-statistic. Second, randomly draw n samples with replacement from the data set. Third, perform regular bootstrap sampling (bagging). Fourth, draw without replacement the indices of the chosen variables, based on the heuristic subspacing scheme. Fifth, call the base learners and build the model. Sixth, use the model for prediction on the test set. The results show that the adaptive RSSL algorithm outperforms conventional machine learning algorithms in most cases.

1. Introduction 

Given a dataset $\mathcal{D} = \{\mathbf{z}_i = (\mathbf{x}_i^\top, y_i)^\top, \; i = 1, \dots, n\}$, where $\mathbf{x}_i = (x_{i1}, \dots, x_{ip})^\top \in \mathcal{X} \subseteq \mathbb{R}^p$ and $y_i \in \mathcal{Y}$ are realizations of two random variables $X$ and $Y$ respectively, we seek to use the data $\mathcal{D}$ to build estimators $\hat{f}$ of the underlying function $f$ for predicting the response $Y$ given the vector $X$ of explanatory variables. In keeping with the standard in statistical learning theory, we measure the predictive performance of any given function $f$ using the theoretical risk functional given by

$$R(f) = \mathbb{E}[\ell(Y, f(X))] = \int_{\mathcal{X} \times \mathcal{Y}} \ell(y, f(\mathbf{x})) \, dP(\mathbf{x}, y), \tag{1}$$

with the ideal scenario corresponding to the universally best function defined by

$$f^* = \arg\inf_{f} \{R(f)\} = \arg\inf_{f} \{\mathbb{E}[\ell(Y, f(X))]\}. \tag{2}$$

For classification tasks, the most commonly used loss function is the zero-one loss $\ell(Y, f(X)) = \mathbb{1}\{Y \neq f(X)\}$, for which the theoretical universal best defined in (2) is the Bayes classifier, given by $f^*(\mathbf{x}) = \arg\max_{y \in \mathcal{Y}} \{\Pr[Y = y \mid \mathbf{x}]\}$. For regression tasks, the squared loss $\ell(Y, f(X)) = (Y - f(X))^2$ is by far the most commonly used, mainly because of the wide variety of statistical, mathematical and computational benefits it offers. For regression under the squared loss, the universal best defined in (2) is known theoretically to be the conditional expectation of $Y$ given $X$, specifically $f^*(\mathbf{x}) = \mathbb{E}[Y \mid X = \mathbf{x}]$. Unfortunately, these theoretical expressions of the best estimators cannot be realized in practice because the distribution function $P(\mathbf{x}, y)$ of $(X, Y)$ defined on $\mathcal{X} \times \mathcal{Y}$ is unknown. To circumvent this learning challenge, one has to do essentially two foundational things, namely: (a) choose a certain function class $\mathcal{F}$ (approximation) from which to search for the estimator $\hat{f}$ of the true but unknown underlying $f$, and (b) specify the empirical version of (1) based on the given sample and use that empirical risk as the practical objective function. However, in this paper, we do not directly construct our estimating classification functions from the empirical risk. Instead, we build the estimators using other optimality criteria, and then compare their predictive performances using the average test error AVTE(·), namely


$$\mathrm{AVTE}(\hat{f}) = \frac{1}{R} \sum_{r=1}^{R} \left\{ \frac{1}{m} \sum_{t=1}^{m} \ell\left(y_{i_t}^{(r)}, \hat{f}_r(\mathbf{x}_{i_t}^{(r)})\right) \right\}, \tag{3}$$

where $\hat{f}_r(\cdot)$ is the $r$-th realization of the estimator $\hat{f}(\cdot)$ built using the training portion of the split of $\mathcal{D}$ into training set and test set, $(\mathbf{x}_{i_t}^{(r)}, y_{i_t}^{(r)})$ is the $t$-th observation from the test set at the $r$-th random replication of the split of $\mathcal{D}$, $m$ is the size of the test set, and $R$ is the number of random replications. In this paper, we consider both multiclass classification tasks with response space $\mathcal{Y} = \{1, 2, \dots, G\}$ and regression tasks with $\mathcal{Y} = \mathbb{R}$, and we focus on learning machines from a function class $\mathcal{F}$ whose members are ensemble learners in the sense of Definition 1.

In machine learning, to improve the accuracy of a regression or classification function, researchers often combine multiple estimators, because it has been shown both theoretically and empirically (Turner & Ghosh, 1995; Turner & Oza, 1999) that an appropriate combination of good base learners leads to a reduction in prediction error. This technique is known as ensemble learning (aggregation). Regardless of the underlying algorithm used, ensemble learning on average outperforms a single learner, especially for prediction purposes (van Wezel & Potharst, 2007). There are many approaches to ensemble learning. Two popular ones are bagging (Breiman, 1996) and boosting (Freund, 1995). Many variants of these two techniques, such as random forests (Breiman, 2001) and AdaBoost (Freund & Schapire, 1997), have been studied and applied to prediction problems. Our proposed method belongs to the subclass of ensemble learning methods known as random subspace learning.
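
As an illustration of the average test error in (3), the following sketch estimates AVTE by repeated random splitting under the squared loss. The function names `avte` and `fit_ols`, the default of 100 replications, and the 70/30 split are assumptions made for this example, not the exact protocol of the paper.

```python
import numpy as np

def fit_ols(X, y):
    """Hypothetical base learner: least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xnew: np.column_stack([np.ones(len(Xnew)), Xnew]) @ beta

def avte(X, y, fit, n_rep=100, test_frac=0.3, seed=0):
    """Mean and standard deviation of the squared test error over n_rep random splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    m = max(1, int(round(test_frac * n)))        # test-set size
    errors = np.empty(n_rep)
    for r in range(n_rep):
        perm = rng.permutation(n)
        test, train = perm[:m], perm[m:]
        f_r = fit(X[train], y[train])            # r-th realization of the estimator
        errors[r] = np.mean((y[test] - f_r(X[test])) ** 2)
    return errors.mean(), errors.std()

# Toy usage on simulated data (assumed, for illustration only)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 0.0]) + rng.normal(size=200)
mean_err, sd_err = avte(X, y, fit_ols)
```

Any other learner can be plugged in through the `fit` argument, as long as it returns a prediction function.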

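Definition 1 Given an ensemble $\mathcal{H} = \{h_1, h_2, \dots, h_L\}$ of base learners $h_l : \mathcal{X} \to \mathcal{Y}$, with relative weights $\alpha_l \in \mathbb{R}_{+}^{*}$ (usually $\alpha_l \in (0, 1)$ for convex aggregation), the ensemble representation $f^{(L)}$ of the underlying function $f$ is given by the aggregation (weighted sum)

$$f^{(L)}(\cdot) = \sum_{l=1}^{L} \alpha_l h_l(\cdot). \tag{4}$$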
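
A minimal sketch of the weighted aggregation in (4), assuming the base learners are plain Python callables. The helper name `aggregate` and the three toy learners are hypothetical and serve only to show the mechanics.

```python
import numpy as np

def aggregate(learners, weights, X):
    """Weighted sum of base-learner outputs: sum over l of alpha_l * h_l(X)."""
    weights = np.asarray(weights, dtype=float)
    preds = np.stack([h(X) for h in learners])   # shape (L, number of points)
    return weights @ preds

# Toy ensemble of three base learners on a scalar input
learners = [lambda x: x, lambda x: 2 * x, lambda x: x ** 2]
x = np.array([0.0, 1.0, 2.0])
equal = aggregate(learners, [1 / 3, 1 / 3, 1 / 3], x)   # equal weights, as in bagging
```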

A question naturally arises as to how the ensemble $\mathcal{H}$ is chosen and how the weights are determined. Bootstrap aggregating, also known as bagging (Breiman, 1996), boosting (Freund & Schapire, 1996), random forests (Breiman, 2001), and bagging with subspaces (Panov & Dzeroski, 2007) are all predictive learning methods based on the ensemble learning principle, for which the ensemble is built from the provided data set $\mathcal{D}$ and the weights are typically taken to be equal. In this paper, we focus on learning tasks involving high dimension low sample size (HDLSS) data, and we further zero in on those data sets for which the number of explanatory variables $p$ is substantially larger than the sample size $n$. As our main contribution, we introduce, develop and apply a new adaptation of the theme of random subspace learning (Ho, 1998), using the traditional multiple linear regression (MLR) model as our base learner in regression and the generalized linear model (GLM) as our base learner in classification.

Some applications by nature possess few instances (small $n$) with a large number of features ($p \gg n$), such as fMRI (Kuncheva et al., 2010) and DNA microarray (Bertoni et al., 2005) data sets. It is hard for a traditional (conventional) algorithm to build a regression model, or to classify the data set, when the instances-to-features ratio is very small. The prediction problem becomes even more difficult when many of these features are highly correlated, or irrelevant for the task of building such a model, as we show later in this paper. Therefore, we harness the power of our proposed adaptive subspace learning technique to guide the selection of good candidate features from the data set, and thereby select the best base learners and, ultimately, the ensemble yielding the lowest possible prediction error.

In most typical random subspace learning algorithms, the features are selected according to an equally likely scheme. The question then arises as to whether one can devise a better scheme for choosing the candidate features, with some predictive benefit. It is also of interest to assess the accuracy of our proposed algorithm under different levels of correlation among the features. The answer to the first question constitutes one of the central aspects of our proposed method, in the sense that we explore a variety of weighting schemes for choosing the features, most of them based on statistical measures of the relationship between the response variable and each explanatory variable. As the computational section will reveal, the weighting schemes proposed here lead to a substantial improvement in the predictive performance of our method over random forest on all but one data set, arguably because our method leverages the accuracy of the learning algorithm by selecting many good models (the weighting scheme allows good variables to be selected more often and therefore leads to near-optimal base learners).

2. Related Work 

Traditionally, in a prediction problem, a single model is built on the training set and the prediction is based solely on this single fitted model. In bagging, by contrast, bootstrap samples are taken from the data set and a model is fitted to each bootstrap sample. The final prediction is the average of all bagged models. Mathematically, the prediction accuracy of the model constructed using bagging is at least as good as that of the traditional model, although the size of the gain depends on the stability of the modeling procedure. It turns out that bagging reduces the variance without affecting the bias, thereby leading to an overall reduction in prediction error, and hence its great appeal. Any set of predictive models can be used as an ensemble in the sense defined earlier. There are many ensemble learning approaches, which can be grouped into four classes: (1) algorithms that use heterogeneous predictive models, such as stacking (Wolpert, 1992); (2) algorithms that manipulate the instances of the data sets, such as bagging (Breiman, 1996), boosting (Freund & Schapire, 1996), random forests (Breiman, 2001), and bagging with subspaces (Panov & Dzeroski, 2007); (3) algorithms that manipulate the features of the data sets, such as random forests (Breiman, 2001), random subspaces (Ho, 1998), and bagging with subspaces (Panov & Dzeroski, 2007); and (4) algorithms that manipulate the learning algorithm, such as random forests (Breiman, 2001), neural network ensembles (Hansen & Salamon, 1990), and extra-trees ensembles (Geurts et al., 2006). Since our proposed algorithm manipulates both the instances and the features of the data sets, we focus on the algorithms in the second and third categories (Breiman, 1996; 2001; Panov & Dzeroski, 2007; Ho, 1998).
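
The variance reduction attributed to bagging above can be illustrated with a standard textbook identity, stated here as background rather than as a result of this paper. Suppose the $B$ bagged predictors $\hat{f}_1(\mathbf{x}), \dots, \hat{f}_B(\mathbf{x})$ at a point $\mathbf{x}$ are identically distributed with variance $\sigma^2$ and pairwise correlation $c$ (the symbols $B$, $\sigma^2$ and $c$ are introduced only for this sketch). Then

$$\operatorname{Var}\left(\frac{1}{B} \sum_{b=1}^{B} \hat{f}_b(\mathbf{x})\right) = c\,\sigma^2 + \frac{1 - c}{B}\,\sigma^2.$$

The second term vanishes as $B$ grows, so the benefit of averaging is limited by the correlation $c$ between base learners. This is one motivation for methods that decorrelate the base learners through random feature selection.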

Bagging (Breiman, 1996), or bootstrap aggregating, is an ensemble learning method that generates multiple predictive models. These models are obtained by drawing bootstrap replicates of the learning (training) data set and using each replicate to build a separate predictive model. A bootstrap sample is obtained by sampling uniformly at random, with replacement, from the instances of the training data set. The final decision is made by averaging the predictions in regression tasks and by taking the majority vote in classification tasks. Bagging tends to decrease the variance while keeping the bias of a single learner. Its accuracy gains are largest when the base learner is unstable, meaning that a small fluctuation in the training data set causes a large change in the predictions on the test data set, as is the case for trees (Breiman, 1996).

Random forests (Breiman, 2001) is an ensemble learning method that averages the predictions of multiple independent predictor (tree) models. Like bagging (Breiman, 1996), it uses bootstrap replicates to construct different predictors. At each node of a tree, it randomly selects a subset of the attributes and then chooses the best attribute from the selected subset. It is considered to improve over bagging by de-correlating the trees. As Denil et al. (2014) mention, when building a random tree, three issues should be decided in advance: (1) the leaf splitting method, (2) the type of predictor, and (3) the randomness method.

Random subspace learning (Ho, 1998) is an ensemble learning method that constructs base models from different features. It chooses a subset of features and then learns the base model using only these features. The random subspace method reaches its highest accuracy when both the number of features and the number of instances are large. In addition, it performs well when there are redundant features in the data set.

Bagging with subspaces (Panov & Dzeroski, 2007) is an ensemble learning method that combines bagging (Breiman, 1996) and random subspaces (Ho, 1998). It generates bootstrap replicates of the training data set, in the same way as bagging, and then randomly chooses a subset of the features, in the same manner as random subspaces. It outperforms both bagging and random subspaces. It has also been found to yield the same performance as random forests when decision trees are used as base learners.

In the simulation part of this paper, we aim to answer the following research questions: (1) Is the performance of the adaptive random subspace learning (RSSL) better than the performance of single classifiers? (2) What is the performance of the adaptive RSSL compared to the most widely used classifier ensembles? (3) Is there a theoretical explanation as to why adaptive RSSL works well for most of the simulated and real-life data sets? (4) How does adaptive RSSL perform under different parameter settings and with various values of the instance-to-feature ratio (IFR)? (5) How does the correlation between features affect the predictive performance of adaptive RSSL?

3. Adaptive RSSL 

In this section, we present an adaptive random subspace learning algorithm for the prediction problem. We start with the formulation of the problem, followed by our proposed algorithm for tackling it. A crucial step, assessing the candidate features for building the models, is explained in detail. Finally, we elucidate the strength of the new algorithm from a theoretical perspective.

3.1. Problem formulation 

As we said earlier, our proposed method belongs to the category of random subspace learning, where each base learner is constructed using a bootstrap sample and a subset of the original $p$ features. The main difference here is that we use base learners that are typically considered not to lead to any improvement when aggregated, and we also select features using weighting schemes inspired by the strength of the relationship between each feature and the response (target). Each base learner is driven by the subset $\{j_1^{(l)}, \dots, j_d^{(l)}\} \subset \{1, 2, \dots, p\}$ of $d$ predictor variables that are randomly selected to build it, and by the subsample $\mathcal{D}^{(l)}$ drawn with replacement from $\mathcal{D}$. For notational convenience, we use vectors of indicator variables to denote these two important quantities. The sample indicator is $\delta^{(l)} = (\delta_1^{(l)}, \dots, \delta_n^{(l)}) \in \{0, 1\}^n$, where

$$\delta_i^{(l)} = \begin{cases} 1 & \text{if } \mathbf{z}_i \in \mathcal{D}^{(l)} \\ 0 & \text{otherwise.} \end{cases}$$

The variable indicator is $\gamma^{(l)} = (\gamma_1^{(l)}, \dots, \gamma_p^{(l)}) \in \{0, 1\}^p$, where

$$\gamma_j^{(l)} = \begin{cases} 1 & \text{if } j \in \{j_1^{(l)}, \dots, j_d^{(l)}\} \\ 0 & \text{otherwise.} \end{cases}$$

The estimator of the $l$-th base learner can therefore be fully and unambiguously denoted by $\hat{g}^{(l)}(\cdot\,; \gamma^{(l)}, \delta^{(l)})$, which we refer to as $\hat{g}^{(l)}(\cdot)$ for notational simplicity. Each $\gamma_j^{(l)}$ is chosen according to one of the weighting schemes. To increase the strength of the developed algorithm, we introduce a weighting scheme procedure to select the important features, which facilitates building a proper model and leverages the prediction accuracy. Our weighting schemes are:

• Correlation coefficient: We measure the strength of the association between each feature vector and the response (target), and take the square of the resulting sample correlation.

• F-statistic: For classification tasks especially, we use the observed F-statistic resulting from the analysis of variance with $x_j$ as the response and the class label as the treatment group.
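
The two weighting schemes can be sketched as follows using NumPy only. The function names, and the final normalization of the scores into selection probabilities, are assumptions made for illustration; the text above specifies only the raw scores. Features with zero variance would cause a division by zero and are assumed absent.

```python
import numpy as np

def correlation_weights(X, y):
    """Squared sample correlation of each column of X with y, normalized to sum to 1."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = (Xc * yc[:, None]).sum(axis=0) / denom
    score = r ** 2
    return score / score.sum()

def f_statistic_weights(X, labels):
    """One-way ANOVA F-statistic of each feature across classes, normalized to sum to 1."""
    classes = np.unique(labels)
    n, G = len(labels), len(classes)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for g in classes:
        Xg = X[labels == g]
        mean_g = Xg.mean(axis=0)
        ss_between += len(Xg) * (mean_g - grand_mean) ** 2
        ss_within += ((Xg - mean_g) ** 2).sum(axis=0)
    F = (ss_between / (G - 1)) / (ss_within / (n - G))
    return F / F.sum()
```

Either vector can then be passed as the probability argument of a weighted sampler, so that features more strongly related to the response are drawn more often.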

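Using the ensemble $\{\hat{g}^{(l)}(\cdot), \; l = 1, \dots, L\}$, we form the ensemble estimator of class membership as

$$\hat{f}^{(L)}(\mathbf{x}) = \arg\max_{y \in \mathcal{Y}} \left\{ \sum_{l=1}^{L} \mathbb{1}\{y = \hat{g}^{(l)}(\mathbf{x})\} \right\},$$

and the ensemble estimator of regression response as

$$\hat{f}^{(L)}(\mathbf{x}) = \frac{1}{L} \sum_{l=1}^{L} \hat{g}^{(l)}(\mathbf{x}).$$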
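
Putting the pieces together, the procedure can be sketched for regression as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `fit_adaptive_rssl` and `predict_rssl`, the default subspace size `d`, and the use of ordinary least squares via `numpy.linalg.lstsq` as the MLR base learner are all choices made for the example.

```python
import numpy as np

def fit_adaptive_rssl(X, y, n_learners=450, d=None, seed=0):
    """Fit least-squares base learners on bootstrap samples and weighted random subspaces."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if d is None:
        d = max(1, min(p, n // 2))               # assumed heuristic subspace size
    # Feature weights: squared sample correlation with the response, normalized
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    prob = r ** 2 / np.sum(r ** 2)
    ensemble = []
    for _ in range(n_learners):
        rows = rng.integers(0, n, size=n)                     # bootstrap sample, with replacement
        cols = rng.choice(p, size=d, replace=False, p=prob)   # weighted subspace, without replacement
        A = np.column_stack([np.ones(n), X[np.ix_(rows, cols)]])
        beta, *_ = np.linalg.lstsq(A, y[rows], rcond=None)
        ensemble.append((cols, beta))
    return ensemble

def predict_rssl(ensemble, Xnew):
    """Average the predictions of all base learners with equal weights."""
    preds = [np.column_stack([np.ones(len(Xnew)), Xnew[:, cols]]) @ beta
             for cols, beta in ensemble]
    return np.mean(preds, axis=0)
```

For classification, the same loop would fit a GLM on each subspace and `predict_rssl` would take a majority vote instead of a mean.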

4. Experiments 

We used a collection of simulated and real-world data sets for our experiments. In addition, we used real-world data sets from previous papers that address the same problem, for comparison purposes. We report the mean squared error (MSE) of each algorithm for regression tasks and the misclassification rate (MCR) for classification tasks.

4.1. Simulated data sets 

We designed our artificial data sets to fit six scenarios based on three factors: the dimensionality of the data (number of features), the sample size (number of instances), and the correlation of the data.
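
To make the scenarios concrete, the sketch below generates data for a given combination of sample size, dimensionality and correlation. The (n, p, ρ) settings are those listed in Table 1. Everything else is assumed for illustration, since the generating model is not specified here: equicorrelated Gaussian features, a linear response, the number of active features `n_active`, and the noise level.

```python
import numpy as np

# (n, p, rho) settings as listed in Table 1
SCENARIOS = [(200, 25, 0.05), (200, 25, 0.5), (25, 200, 0.05),
             (25, 200, 0.5), (50, 1000, 0.05), (1000, 50, 0.05)]

def simulate(n, p, rho, n_active=5, noise_sd=1.0, seed=0):
    """Draw equicorrelated Gaussian features and a linear response (assumed design)."""
    rng = np.random.default_rng(seed)
    # Shared-factor construction: unit variances and corr(X_j, X_k) = rho for j != k
    common = rng.normal(size=(n, 1))
    X = np.sqrt(rho) * common + np.sqrt(1.0 - rho) * rng.normal(size=(n, p))
    beta = np.zeros(p)
    beta[:n_active] = 1.0                        # hypothetical active coefficients
    y = X @ beta + noise_sd * rng.normal(size=n)
    return X, y

datasets = {s: simulate(*s) for s in SCENARIOS}
```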

4.2. Real data sets 

We draw on the public UCI repository of real-life data sets in our paper. For consistency and completeness, we chose real data sets that differ in the number of instances and the number of features and that cover a variety of applications. The real data sets are summarized by task in Table 5.





Figure 1. Prior Feature Importance: Representative simulation results for regression analysis on the synthetic dataset scenario with number of instances n=25, number of features p=500, correlation coefficient ρ=0.5, number of learners=450, and number of replications=100.


To elucidate the performance of our developed model, we compare the accuracy of the adaptive RSSL with random forest and ... on the same real data sets they used before.

5. Discussion 

Our experiments on synthetic data sets reveal that selecting more than 15-20 features (for our particular dataset) yields ensemble classifiers that are highly accurate and stable. The reason is that only if the number of voters is "large enough" does the random process of attribute selection yield a sufficient number of qualitatively different classifiers to ensure high accuracy and stability of the ensemble.

How many bootstrap replications are useful? The evidence, both experimental and theoretical, is that bagging can push a good but unstable procedure a significant step towards optimality. Why was the training set chosen to be large for the real data sets, while the test set was large for the simulated data sets? The bootstrap sampling was repeated 50 times. The
Table 1. Regression Analysis: Mean Square Error (MSE) for different machine learning algorithms on various scenarios of synthetic data sets.

Weighting      n     p     ρ     MLR            Uniform MLR    Adaptive MLR   RF             Better?
Correlation    200   25    0.05  5.69±0.89      14.50±2.63     4.60±0.706     9.81±1.86      ✓
Correlation    200   25    0.5   4.78±0.81      11.67±2.55     4.77±0.94      8.46±1.97      ✓
Correlation    25    200   0.05  974.37±5.e3    18.35±6.92     8.10±3.86      18.56±7.24     ✓
Correlation    25    200   0.5   5.e3±5.e4      18.83±8.72     8.27±5.24      18.18±8.65     ✓
Correlation    50    1000  0.05  2.e4±1.e5      28.36±11.51    12.38±5.91     27.92±11.78    ✓
Correlation    1000  50    0.05  4.66±0.34      16.62±1.37     4.33±0.33      6.73±0.62      ✓
F-statistic    200   25    0.05  5.04±0.79      14.42±2.67     4.48±0.74      8.75±1.76      ✓
F-statistic    200   25    0.5   4.49±0.76      12.06±2.04     5.51±1.09      8.33±1.59      ✓
F-statistic    25    200   0.05  3.e4±2.e5      17.77±9.15     5.81±4.10      15.81±8.55     ✓
F-statistic    25    200   0.5   1.e4±1.e5      23.09±16.06    12.53±10.27    24.11±16.31    ✓
F-statistic    50    1000  0.05  4.e5±3.e6      16.65±5.38     7.65±2.83      15.54±5.31     ✓
F-statistic    1000  50    0.05  4.19±0.33      15.97±1.15     3.90±0.30      6.24±0.55      ✓

Figure 2. Prior Feature Importance: Representative results for classification analysis on the real Lymphoma disease dataset.


random division of the data is repeated 100 times. Choosing between these two strategies is not an easy task, since it involves a trade-off between bias and estimation variance over the forecast horizon.

Even though our adaptive RSSL algorithm outperforms many classifier ensembles, it has limitations. The algorithm cannot handle data sets with categorical features; such features must first be encoded numerically. Also, the algorithm is not designed



[Plot: average test error for MLR, U-MLR, A-MLR, and Random Forest]


Figure 3. Representative results on the synthetic dataset scenario with number of instances n=50, number of features p=1000, correlation coefficient ρ=0.05, number of learners=450, and number of replications=100. We used the correlation weighting scheme for regression analysis, on a logarithmic scale.


to classify data sets with multiple classes. Moreover, the adaptive RSSL algorithm sometimes fails to select the optimal feature subsets.

6. Conclusion and Future Work 

We presented a detailed quantitative analysis of the adaptive RSSL algorithm for an ensemble prediction problem. We support this analysis with a theoretical (mathematical) formulation. The key issues for the developed algorithm rest on four fundamental factors: generalization, flexibility, speed, and accuracy. We will explain each of these four factors. We present a rigorous theoretical justification of our proposed algorithm. For now, we choose a fixed number of attributes per subset. However, the algorithm should be evaluated based on the performance
Table 2. Regression Analysis: Mean Square Error (MSE) for different machine learning algorithms on real data sets.

Data Set     Weighting     MLR               Uni. MLR          Adap. MLR         RF                  Better?
BodyFat      Correlation   17.41±2.69        23.59±3.71        19.25±3.06        19.72±3.18          ✗
BodyFat      F-statistic   17.06±2.50        23.07±3.46        17.46±2.65        19.51±2.99          ✗
Attitude     Correlation   74.12±32.06       80.35±34.40       58.49±20.21       88.72±35.97         ✓
Attitude     F-statistic   75.19±36.63       74.71±33.17       51.84±15.19       82.21±35.58         ✓
Cement       Correlation   10.76±7.25        NA                19.92±15.98       75.91±56.05         ✗
Cement       F-statistic   11.07±8.55        NA                24.27±18.27       62.20±46.53         ✗
Diabetes 1   Correlation   2998.13±322.37    3522.30±311.81    3165.74±300.86    3203.94±311.94      ✗
Diabetes 1   F-statistic   2988.32±341.20    3533.45±375.38    3133.60±324.75    3214.11±318.6931    ✗
Diabetes 2   Correlation   3916.98±782.35    4244.00±390.29    3016.54±285.89    3266.50±324.82
Diabetes 2   F-statistic   3889.00±679.55    4306.76±419.66    3076.77±338.08    3326.28±382.37      ✓
Longley      Correlation   0.21±0.13         0.62±0.36         0.49±0.29         1.54±0.92           ✗
Longley      F-statistic   0.22±0.13         0.66±0.42         0.49±0.29         1.63±1.04           ✗
Prestige     Correlation   66.68±15.31       73.32±14.77       64.87±13.93       55.96±11.83         ✗
Prestige     F-statistic   65.77±15.96       72.33±16.64       63.27±14.71       56.02±12.66         ✗

Table 3. Classification Analysis: Misclassification Rate (MCR) for different machine learning algorithms on various scenarios of simulated data sets.

Weighting      n     p     ρ     GLM            Uni. GLM       Adap. GLM      RF             Better?
F-statistic    200   25    0.05  0.070±0.033    0.486±0.172    0.071±0.032    0.101±0.053    ✓
F-statistic    200   25    0.5   0.140±0.045    0.498±0.221    0.138±0.043    0.136±0.058    ✗
F-statistic    50    200   0.05  0.102±0.093    0.673±0.123    0.100±0.092    0.320±0.103    ✓
F-statistic    50    200   0.5   0.058±0.141    0.346±0.346    0.049±0.121    0.178±0.188    ✓
F-statistic    50    1000  0.05  0.033±0.064    0.522±0.158    0.034±0.062    0.409±0.114    ✓
F-statistic    1000  50    0.05  0.130±0.019    0.643±0.028    0.130±0.019    0.167±0.024    ✓


(accuracy) to determine the appropriate subset size (dimension) for the single classifiers used in the ensemble. In addition, the adaptive RSSL algorithm was tested on relatively small data sets. Our next step will be to apply the algorithm to big data sets.

Also, we show that the adaptive RSSL performs better than widely used ensemble algorithms even with the dependence of feature subsets.

Computational issues.

Table 4. Classification Analysis: Misclassification Rate (MCR) for different machine learning algorithms on real data sets.

Data Set            Weighting     GLM            Uni. GLM       Adap. GLM      RF             Better?
Diabetes in Pima    F-statistic   0.274±0.071    0.249±0.051    0.255±0.051    0.269±0.050    ✓
Prostate Cancer     F-statistic   0.425±0.113    0.355±0.093    0.332±0.094    0.343±0.098    ✓
Golub Leukemia      F-statistic   0.427±         0.023±         0.021±         0.023±         ✓
Diabetes            F-statistic   0.034±0.031    0.068±0.039    0.038±0.034    0.031±0.029    ✗
Lymphoma            F-statistic   0.248±0.065    0.057±0.034    0.046±0.029    0.082±0.046    ✓
Lung Cancer         F-statistic   0.113±0.051    0.038±0.023    0.037±0.024    0.051±0.030    ✓
Colon Cancer        F-statistic   0.296±0.124    0.168±0.095    0.124±0.074    0.199±0.106    ✓


[Plot: average test error on the original scale for MLR, U-MLR, A-MLR, and Random Forest]

Figure 4. Representative results on the Diabetes interaction real dataset with the correlation weighting scheme for regression analysis, on the original scale.

Figure 6. Representative results on the Diabetes in Pima Indian Women real dataset with the F-statistic weighting scheme for classification analysis.

Table 5. Summary of the regression and classification real data sets.